import json
import pandas as pd
import urllib.request
import matplotlib.pyplot as plt
from mpl_toolkits.basemap import Basemap
import numpy as np
The API which I have chosen for this project is the Earthquake Catalog provided by the United States Geological Survey.
This API allows the user to build a wide variety of searches and queries over earthquake data, both from the present day and from the past. There is a vast array of parameters, which I used below in order to find out very specific information.
Fortunately, the information provided by the query function of this API can be output in CSV format. This made it very easy to manipulate the data using Pandas.
The API Documentation may be found here: https://earthquake.usgs.gov/fdsnws/event/1/
This API is free to use, and as such no API Key was needed.
Fortunately for myself, the data provided by this API comes in a very easy-to-work-with format. I have chosen to use CSV for the collection of this data, as it can very easily be transferred into a pandas DataFrame using pandas' built-in functions.
For this project, I will be working with two datasets for the purpose of analysing them. The first is all of the earthquakes from 1st September 2019 to 30th September 2019, regardless of how small. This will be a very large dataset, but it should give us an insight into certain metrics behind earthquakes, from location to magnitude to depth.
We will call this dataset "monthset".
monthset = pd.read_csv("https://earthquake.usgs.gov/fdsnws/event/1/query?format=csv&starttime=2019-9-01&endtime=2019-9-30")
print("monthset; Input complete")
print("monthset; monthset contains: " + str(len(monthset)) + " entries")
The second dataset that we will introduce for this project is every earthquake in the last 20 years with a minimum magnitude of 5.5. According to http://www.geo.mtu.edu/UPSeis/magnitude.html, this is the threshold at which an earthquake may cause slight damage to buildings and other structures.
Unfortunately this had to be done in a somewhat crude manner, as the data had to be downloaded in separate parts. The parts are concatenated inside the loop, which results in a full dataset which we can use for analysis.
# First we create the initial five-year chunk; the loop then appends the remaining three
yearset = pd.read_csv("https://earthquake.usgs.gov/fdsnws/event/1/query?format=csv&starttime=1999-09-01&endtime=" + str(2004)+"-08-31&minmagnitude=5.5")
for i in range(2009,2020,5):
    temp = pd.read_csv("https://earthquake.usgs.gov/fdsnws/event/1/query?format=csv&starttime="+str(i-5)+"-09-01&endtime=" + str(i)+"-08-31&minmagnitude=5.5")
    yearset = pd.concat([yearset,temp])
print("yearset; Input complete")
print("yearset; yearset contains: " + str(len(yearset)) + " entries")
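The chunked download above could be wrapped in a small helper so the windowing logic is reusable. This is only a sketch: `build_url` and `fetch_windows` are my own names, but the URL parameters match the queries used in this notebook.

```python
import pandas as pd

BASE = "https://earthquake.usgs.gov/fdsnws/event/1/query?format=csv"

def build_url(start, end, minmag=None):
    # Assemble a query URL in the same style as the requests above
    url = BASE + "&starttime=" + start + "&endtime=" + end
    if minmag is not None:
        url += "&minmagnitude=" + str(minmag)
    return url

def fetch_windows(first_year, last_year, span, minmag):
    # One request per `span`-year window, concatenated into one frame
    frames = [pd.read_csv(build_url(str(y) + "-09-01",
                                    str(y + span) + "-08-31", minmag))
              for y in range(first_year, last_year, span)]
    return pd.concat(frames, ignore_index=True)

# Equivalent to the download loop above:
# yearset = fetch_windows(1999, 2019, 5, 5.5)
```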
We can then sort these values by time
yearset = yearset.sort_values(by='time')
print(yearset)
First of all, we will extract some simple data from this initial set. I would like to find out the maximum magnitude of any earthquake which occurred over this time period.
monthset['mag'].max()
We will then do the same for the second set. Over a much larger time period, we would expect a larger maximum
yearset['mag'].max()
One of the main issues with these datasets with regards to analysing the data is that the time is currently stored in string format. To make it easier to analyse this data going forward it is necessary to convert it now rather than later for each individual analysis. We will first do this on the monthset, followed by the yearset.
As the monthset is all contained in the same month and year, we have no need to add these columns
# Code to split the time string given by the API into date and day columns
print("monthset; Converting String Date and Time into Day and Date columns")
day = []
parts = []
datelist = []
daylist = []
# Split up the time format on '-'
for i in range(0,len(monthset.index)):
    splitup = monthset.iloc[i]['time'].split('-')
    parts.append(splitup)
# Split up based on 'T'
for i in range(0,len(monthset.index)):
    splitup1 = monthset.iloc[i]['time'].split('T')
    datelist.append(splitup1)
# The part before the 'T' is the date
date = [x[0] for x in datelist]
timelist = [x[2] for x in parts]
# Split the final '-' component on 'T' in order to access the day
for i in range(0,len(timelist)):
    daysplit = timelist[i].split('T')
    daylist.append(daysplit)
day = [x[0] for x in daylist]
for i in range(0,len(day)):
    day[i] = int(day[i])
print("monthset; Converted Successfully")
We then add these columns into the dataset
monthset['date'] = date
monthset['day'] = day
print("monthset; Columns added successfully")
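As an aside, pandas can parse ISO-8601 timestamps like the API's `time` column directly with `pd.to_datetime`, which would replace the manual splitting above. A sketch on toy data (the sample timestamps are made up, in the format the API returns):

```python
import pandas as pd

# Toy frame with timestamps in the API's ISO-8601 format (made-up values)
df = pd.DataFrame({"time": ["2019-09-15T12:34:56.789Z",
                            "2019-09-02T01:02:03.000Z"]})
parsed = pd.to_datetime(df["time"])
df["date"] = parsed.dt.strftime("%Y-%m-%d")  # e.g. "2019-09-15"
df["day"] = parsed.dt.day                    # integer day of the month
```

The same `.dt` accessor would give `month` and `year` for the yearset in one line each.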
# Code to split the time string given by the API into date, day, month and year columns
print("yearset; Converting String Date and Time into Day, Month, Year and Date columns")
# Temporary lists, used to store values before adding to the dataframe
month = []
year = []
day = []
parts = []
datelist = []
daylist = []
# Split up the time format on '-'
for i in range(0,len(yearset.index)):
    splitup = yearset.iloc[i]['time'].split('-')
    parts.append(splitup)
# Split up based on 'T'
for i in range(0,len(yearset.index)):
    splitup1 = yearset.iloc[i]['time'].split('T')
    datelist.append(splitup1)
# The part before the 'T' is the date
date = [x[0] for x in datelist]
# The second '-' component is the month; convert it to an integer
month = [int(x[1]) for x in parts]
# The first '-' component is the year; convert it to an integer
year = [int(x[0]) for x in parts]
timelist = [x[2] for x in parts]
# Split the final '-' component on 'T' in order to access the day
for i in range(0,len(timelist)):
    daysplit = timelist[i].split('T')
    daylist.append(daysplit)
day = [x[0] for x in daylist]
for i in range(0,len(day)):
    day[i] = int(day[i])
print("yearset; Converted Successfully")
yearset['year'] = year
yearset['month'] = month
yearset['date'] = date
yearset['day'] = day
print("yearset; Columns added successfully")
The next step of this process is to use the data which has been collected for analysis. The nature and scale of the data provided by this API are such that a simplified set of columns will be much more efficient for data analysis and visualisation.
The columns I chose to include are the date and day columns which we just split the data into. We will also take the longitude and latitude, which we will later use to display the data on simple maps.
We will also take the magnitude and depth of the earthquake, which will be used as the main variables in our analysis. We will keep the place column as well, as it provides a brief description of the location of the earthquake.
simp_monthset = monthset[["date","latitude","longitude","mag","depth","place","day"]]
print("monthset; Simplified Dataframe Created")
We will now do the same with the yearset, this time making sure to add the month and year columns which we also added to the dataset
simp_yearset = yearset[["date","latitude","longitude","mag","depth","place","day","month","year"]]
print("yearset; Simplified Dataframe Created")
In order to display the data better, we will sort these events into chronological order. The indices will now be in reverse.
simp_monthset = simp_monthset.sort_values(by="date")
simp_yearset = simp_yearset.sort_values(by="date")
print("monthset & yearset; Sorted by date")
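Sorting leaves each row's original index attached; if a clean 0..n index is wanted after the sort, pandas' `reset_index` restores one. A quick sketch on toy data:

```python
import pandas as pd

# Toy frame whose rows start out in reverse chronological order
df = pd.DataFrame({"date": ["2019-09-30", "2019-09-01"], "mag": [4.0, 5.0]})
df = df.sort_values(by="date").reset_index(drop=True)
# Rows are now chronological and the index runs 0..n again
```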
As another preprocessing measure, we will search these now simplified datasets for null values. This will help with the overall display of data, and we may find some entries which are of no value to us.
Any earthquake with a magnitude value of 0 will be deleted from the monthly dataset, as it will not help with our display of the data.
As the 20 Year dataset accounts for a minimum magnitude already, we will not need to do this.
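The null-value scan itself can be done with pandas' `isnull` and `dropna`; a sketch of what such a check might look like on toy data:

```python
import pandas as pd
import numpy as np

# Toy frame with some missing values and a zero magnitude
df = pd.DataFrame({"mag": [4.2, np.nan, 0.0],
                   "depth": [10.0, 5.0, np.nan]})
print(df.isnull().sum())   # how many nulls each column contains
df = df.dropna()           # remove rows with any null value
df = df[df.mag > 0.0]      # remove zero-magnitude rows, as done below
```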
simp_monthset_nozero = simp_monthset[simp_monthset.mag > 0.0]
simp_monthset_nozero[1:5]
We now have two reduced datasets which we can use to analyse the data further. Later on, we will create a scatter plot comparing the magnitude of these earthquakes with the day on which they occurred.
Firstly, though, we will display the distribution of magnitude values using a histogram. I have chosen to use 70 bins, due to the fact that the highest-magnitude earthquake for this month is 6.7; this means that 70 bins will show the data roughly in steps of 0.1.
plt.figure(figsize=(30, 9))
simp_monthset_nozero['mag'].hist(bins = 70)
plt.ylabel("Count",size=20)
plt.xlabel("Magnitude",size=20)
plt.title("Distribution of Magnitudes for Monthly Dataset",size=20)
plt.show()
Next up, we will analyse the second dataset, also showing the distribution of magnitudes. As we have a minimum magnitude of 5.5 and a maximum magnitude of 9.1, we will choose 40 bins this time. These two graphs are shown on different scales, but in theory should be somewhat similar in shape.
plt.figure(figsize=(30, 10))
simp_yearset['mag'].hist(bins = 40,color='red')
plt.ylabel("Count",size=20)
plt.xlabel("Magnitude",size=20)
plt.title("Distribution of Magnitudes for 20 Year Dataset",size=20)
plt.show()
The results of analysing these two datasets are very interesting. Firstly, the shape of the monthset graph is quite surprising. While it is true to say that the overall data is right-skewed, there is an interesting spike in magnitude between 4 and 5 on the Richter scale.
The Richter scale works in such a way that each increase of 1.0 in magnitude should correspond to roughly a 10 times stronger earthquake, so a spike in the data is very surprising. It is a possibility that this data is taken over too short a time scale for the expected patterns to develop, but with over 15000 entries in the data it remains very surprising.
The second graph is much more of what I had expected when I was analysing this data. This is a very classic right-skew graph and shows the distribution as the magnitude of the earthquakes increases. There seems to be a huge outlier in the earthquake of magnitude 9.1. Let's find out more.
simp_yearset.loc[simp_yearset['mag'] == simp_yearset['mag'].max()]
It is interesting to note that the depths of these earthquakes are quite similar. I would like to find out later on whether these two characteristics of an earthquake are somehow related.
For the purpose of visualising this earthquake data on a world map, I have chosen to eliminate any earthquake under a magnitude of 2.5. According to the website http://www.geo.mtu.edu/, this is the threshold for an earthquake to be felt, rather than simply recorded by a seismograph.
simp_monthset_greater_25 = simp_monthset[simp_monthset.mag > 2.5]
We will now quickly check to see what the minimum magnitude currently is
simp_monthset_greater_25['mag'].min()
Here is the further simplified data shown on a map of the world
# Change size of the map
plt.figure(figsize=(20, 40))
# Uses a Mollweide projection, with long 0 at position 0, and resolution 'high'
m = Basemap(projection='moll',lon_0=0,resolution='h')
m.drawcoastlines()
m.fillcontinents(color='white',lake_color='white')
# draw parallels and meridians.
m.drawparallels(np.arange(-90.,120.,30.))
m.drawmeridians(np.arange(0.,420.,60.))
m.drawmapboundary(fill_color='white')
plt.title("Earthquake Data for September")
for i in range(0,len(simp_monthset_greater_25.index)):
    long = simp_monthset_greater_25.iloc[i]['longitude']
    lat = simp_monthset_greater_25.iloc[i]['latitude']
    x,y = m(long,lat)
    m.plot(x,y,"ro",markersize=4)
plt.show()
This is a very interesting diagram as it shows the distribution of the earthquakes around the world. As earthquakes occur most frequently along fault lines, it is interesting to see the amount of earthquakes which occur, for example, along the San Andreas Fault.
Another interesting way in which we can analyse this data is to map these earthquakes based on their strength. We will use a different colour code in order to display each one based upon their magnitude.
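The magnitude-to-colour mapping used in the loop below could be factored into a small helper function. This is only a sketch: the colour bands and sizes are the ones used in this notebook, but the helper itself (`mag_style`) is my own addition.

```python
def mag_style(mag):
    # Map a magnitude (assumed >= 2.5, as the data has been filtered)
    # to a matplotlib colour/marker code and a marker size
    if mag < 3.5:
        return "yo", 4   # yellow: 2.5 - 3.5
    elif mag < 4.5:
        return "go", 4   # green: 3.5 - 4.5
    elif mag < 5.5:
        return "bo", 4   # blue: 4.5 - 5.5
    elif mag < 6:
        return "mo", 8   # magenta: 5.5 - 6
    else:
        return "ro", 8   # red: 6+

# Inside the plotting loop this would reduce to:
# colour, size = mag_style(simp_monthset_greater_25.iloc[i]['mag'])
# m.plot(x, y, colour, markersize=size)
```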
# This changes the size of the map
plt.figure(figsize=(20, 40))
# Uses a Mollweide projection, with long 0 at position 0, and resolution 'high'
m = Basemap(projection='moll',lon_0=0,resolution='h')
m.drawcoastlines()
m.fillcontinents(color='white',lake_color='white')
m.drawmapboundary(fill_color='white')
plt.title("Earthquake Data for September, with colour ranks")
for i in range(0,len(simp_monthset_greater_25.index)):
    long = simp_monthset_greater_25.iloc[i]['longitude']
    lat = simp_monthset_greater_25.iloc[i]['latitude']
    x,y = m(long,lat)
    # Take the magnitude from the same filtered frame we are iterating over
    mag = simp_monthset_greater_25.iloc[i]['mag']
    if 2.5 <= mag < 3.5:
        colour = "yo"
        size = 4
    elif 3.5 <= mag < 4.5:
        colour = "go"
        size = 4
    elif 4.5 <= mag < 5.5:
        colour = "bo"
        size = 4
    elif 5.5 <= mag < 6:
        colour = "mo"
        size = 8
    else:  # mag >= 6
        colour = 'ro'
        size = 8
    m.plot(x,y,colour,markersize=size)
plt.show()
Yellow = 2.5 - 3.5
Green = 3.5 - 4.5
Blue = 4.5 - 5.5
Magenta = 5.5 - 6
Red = 6+
Compared to the first graphic, which displays all of the earthquakes with the same rank, this diagram is very interesting. Upon first glance, many areas which had seemed to be stricken by earthquakes were in fact experiencing only very weak ones, most barely being felt. Parallels and meridians have been removed for easier viewing of the data.
According to http://www.geo.mtu.edu/UPSeis/magnitude.html, earthquakes from 2.5 to 5.4 on the Richter Scale are usually felt, but only cause 'minor damage'. What this means with reference to the diagram which can be found above is that over the month of September, only the earthquakes labelled with a Magenta dot on the map would have caused even minor damage to buildings and other structures. These have been made significantly larger than the other earthquakes in order to highlight them on the map.
Those earthquakes which are displayed using a red dot are stronger again, with these having the possibility to 'cause a lot of damage in populated areas'.
This will change significantly when we look at the larger earthquakes which have occured in the past.
Before moving on to the 20 Year Set, we will take a quick look at the distribution of these earthquakes over North America using a relief map.
# This changes the size of the map
plt.figure(figsize=(20, 40))
m = Basemap(width=12000000,height=9000000,projection='lcc',
resolution=None,lat_1=45.,lat_2=55,lat_0=50,lon_0=-107.)
m.shadedrelief()
plt.title("Map of North America")
for i in range(0,len(simp_monthset_greater_25.index)):
    long = simp_monthset_greater_25.iloc[i]['longitude']
    lat = simp_monthset_greater_25.iloc[i]['latitude']
    x,y = m(long,lat)
    # Take the magnitude from the same filtered frame we are iterating over
    mag = simp_monthset_greater_25.iloc[i]['mag']
    if 2.5 <= mag < 3.5:
        colour = "yo"
        size = 4
    elif 3.5 <= mag < 4.5:
        colour = "go"
        size = 4
    elif 4.5 <= mag < 5.5:
        colour = "bo"
        size = 4
    elif 5.5 <= mag < 6:
        colour = "mo"
        size = 6
    else:  # mag >= 6
        colour = 'ro'
        size = 6
    m.plot(x,y,colour,markersize=size)
plt.show()
Yellow = 2.5 - 3.5
Green = 3.5 - 4.5
Blue = 4.5 - 5.5
Magenta = 5.5 - 6
Red = 6+
It is quite interesting to note the quantity of small earthquakes which occur, especially along the mountain ranges shown by the relief map. While most of these earthquakes are scarcely felt by us, the Earth underneath us, which formed the mountains shown here, is in constant motion.
Moving on to the 20 Year Set: firstly, we will do the same as we did initially with the monthset, and plot the earthquakes simply based on where they occurred, ignoring magnitude.
# Change size of the map
plt.figure(figsize=(20, 40))
# Uses a Mollweide projection, with long 0 at position -120
# in order to display the Pacific Ring of Fire, and resolution 'high'
m = Basemap(projection='moll',lon_0=-120,resolution='h')
m.drawcoastlines()
m.fillcontinents(color='white',lake_color='white')
# draw parallels and meridians.
m.drawparallels(np.arange(-90.,120.,30.))
m.drawmeridians(np.arange(0.,420.,60.))
m.drawmapboundary(fill_color='white')
plt.title("Earthquake Data for the Last 20 Years")
for i in range(0,len(simp_yearset.index)):
    long = simp_yearset.iloc[i]['longitude']
    lat = simp_yearset.iloc[i]['latitude']
    x,y = m(long,lat)
    m.plot(x,y,"ro",markersize=4)
plt.show()
What is very interesting in this map, though it should not really be taken as a surprise, is that by plotting the earthquakes which have occurred over the last 20 years, we almost perfectly draw the outlines of the world's tectonic plates.
While it makes sense that most earthquakes occur along plate boundaries, it is very interesting that there is a much greater concentration of earthquakes along the convergent boundaries, where plates drive against each other. By plotting the earthquakes of the last 20 years, we can easily deduce whether or not a plate boundary is a convergent one.
This can be seen in the map below
from IPython.display import Image
Image("https://geology.com/plate-tectonics/plate-boundary-map-780.jpg")
# This changes the size of the map
plt.figure(figsize=(20, 40))
# Uses a Mollweide projection, with long 0 at position 0, and resolution 'high'
m = Basemap(projection='moll',lon_0=-120,resolution='h')
m.drawcoastlines()
m.fillcontinents(color='white',lake_color='white')
# draw parallels and meridians.
m.drawparallels(np.arange(-90.,120.,30.))
m.drawmeridians(np.arange(0.,420.,60.))
m.drawmapboundary(fill_color='white')
plt.title("Earthquake Data for last 20 Years, with colour ranks")
for i in range(0,len(simp_yearset.index)):
    long = simp_yearset.iloc[i]['longitude']
    lat = simp_yearset.iloc[i]['latitude']
    x,y = m(long,lat)
    mag = simp_yearset.iloc[i]['mag']
    if 5.5 <= mag < 7:
        colour = "go"
        size = 2
    elif 7 <= mag < 8:
        colour = "mo"
        size = 9
    else:  # mag >= 8
        colour = 'ro'
        size = 9
    m.plot(x,y,colour,markersize=size)
plt.show()
Green = 5.5 - 7
Magenta = 7 - 8
Red = 8+
For this example I chose to use slightly fewer colours to display the data. Any earthquake which is coloured magenta is considered a major earthquake. These cause serious damage, and there are estimated to be around 20 of them each year.
Earthquakes coloured in red are considered great earthquakes. They can totally destroy communities near the epicentre. It is again very interesting to note just how powerful the earthquakes are around the Pacific Ring of Fire.
As we did previously with the monthly set, I have chosen to bring up a relief map, this time of South America, to show these earthquakes with reference to the terrain on the ground. As earthquakes are a byproduct of plate-boundary movement, we can expect to find that a lot of these earthquakes occur where mountains have formed over the last few million years.
# This changes the size of the map
plt.figure(figsize=(10, 20))
m = Basemap(projection='merc', lat_0 = -37, lon_0 = -71,
resolution = 'h', area_thresh = 0.1,
llcrnrlon=-85, llcrnrlat=-60,
urcrnrlon=-33, urcrnrlat=10 )
m.drawcoastlines()
# draw parallels and meridians.
m.drawparallels(np.arange(-90.,120.,30.))
m.drawmeridians(np.arange(0.,420.,60.))
m.shadedrelief()
plt.title("Earthquake Activity, South America, Last 20 Years")
for i in range(0,len(simp_yearset.index)):
    long = simp_yearset.iloc[i]['longitude']
    lat = simp_yearset.iloc[i]['latitude']
    x,y = m(long,lat)
    mag = simp_yearset.iloc[i]['mag']
    if 5.5 <= mag < 7:
        colour = "go"
        size = 2
    elif 7 <= mag < 8:
        colour = "mo"
        size = 4
    else:  # mag >= 8
        colour = 'ro'
        size = 9
    m.plot(x,y,colour,markersize=size)
plt.show()
Green = 5.5 - 7
Magenta = 7 - 8
Red = 8+
In order to get a greater insight into the actual data found in this dataset, it may be useful to group the 20-year data we have acquired into groups separated by month and also by year. This will provide valuable information which we can visualise using the built-in graphing features of pandas.
In order to do this, we will further reduce the data that we are comparing. For this, we will simply take the magnitude and the depth of each earthquake, along with the year that it occured.
We can also use this monthly data to see where the month of September 2019 would fit in compared to the data provided (by examining only the data whose mag >= 5.5).
First off, we will chart the yearly mean depth of the earthquakes
yeardepth = simp_yearset[['depth','year']]
yeardepth.describe()
meanyeardepth = yeardepth['depth'].mean()
meanyeardepth
We now group them by year
yeardepth = yeardepth.groupby('year').mean()
Here are the mean values:
#yeardepth
yeardepth.plot(figsize=(10, 5))
plt.xticks(np.arange(1999, 2019, step=1))
plt.yticks(np.arange(30,100,step=5))
plt.plot([1999, 2019], [meanyeardepth, meanyeardepth], color='g', linestyle='--', linewidth=2)
plt.title("Mean Yearly Depth of Earthquakes since 1st September 1999")
plt.ylabel("Depth")
plt.xlabel("Time")
plt.show()
As we can see from this graph, there seems to be a lot of outliers in this set of data. The dashed green line is the overall mean for this data. We can see that the standard deviation is quite high in this sample. We will now do the same for the Magnitudes of Earthquakes and compare the results to one another
print("The standard deviation for this data is " + str(yeardepth.std()))
yearmag = simp_yearset[['mag','year']]
yearmag['mag'].describe()
meanyearmag = yearmag['mag'].mean()
yearmag = yearmag.groupby('year').mean()
#yearmag
yearmag.plot(figsize=(10, 5),color='red')
plt.yticks(np.arange(5.8, 6, step=0.02))
plt.xticks(np.arange(1999, 2019, step=1))
plt.plot([1999, 2019], [meanyearmag, meanyearmag], color='g', linestyle='--', linewidth=2)
plt.title("Mean Yearly Magnitude of Earthquakes since 1st September 1999")
plt.ylabel("Magnitude")
plt.xlabel("Time")
plt.show()
print("The standard deviation for this data is " + str(yearmag.std()))
As we can see from this graph, the mean earthquake magnitude over the last 20 years has stayed very much the same. The standard deviation for this sample is very small. It might be interesting to plot both the depth and magnitude of these earthquakes in order to see if there is any correlation between years with a higher mean depth of earthquake, and magnitude
One possible way in which we could improve this dataset for analysis would be to remove outliers in the depths and magnitudes and repeat the graphing process. This may give a clearer result.
This works by eliminating any value from the set which lies more than 3 standard deviations from the mean.
outlierdepth = simp_yearset[['depth','year']]
reduced_outlier_depth=outlierdepth[np.abs(outlierdepth.depth-outlierdepth.depth.mean()) <= (3*outlierdepth.depth.std())]
reduced_outlier_depth = reduced_outlier_depth.groupby('year').mean()
reduced_outlier_depth.plot(figsize=(10, 5),color='green')
plt.xticks(np.arange(1999, 2019, step=1))
plt.yticks(np.arange(30,100,step=5))
plt.title("Mean Yearly Depth of Earthquakes within 3 Standard Deviations from Mean since 1st September 1999")
plt.ylabel("Depth")
plt.xlabel("Time")
plt.show()
print("The standard deviation for this data is " + str(reduced_outlier_depth.std()))
While the mean depth is dramatically decreased in this dataset, I feel that this is an inaccurate depiction of the earthquake data.
We will now attempt the same type of analysis, but sorting by month instead of year. As the method is essentially the same, I will do this in a much shorter way.
monthdepth = simp_yearset[['depth','month']]
monthmag = simp_yearset[['mag','month']]
We will now group the mean values by month
monthdepth = monthdepth.groupby('month').mean()
monthmag = monthmag.groupby('month').mean()
The mean value for the entire dataset is shown as a dashed green line on the below diagrams
monthdepth.plot(figsize=(10, 5))
plt.xticks(np.arange(1, 12, step=1))
plt.yticks(np.arange(40, 80, step=5))
plt.plot([1, 12], [monthdepth['depth'].mean(), monthdepth['depth'].mean()], color='g', linestyle='--', linewidth=2)
plt.title("Mean Monthly Depth of Earthquakes since 1st September 1999")
plt.ylabel("Depth")
plt.xlabel("Time")
plt.show()
We will do the same for the magnitudes
monthmag.plot(figsize=(10, 5),color='red')
plt.xticks(np.arange(1, 12, step=1))
plt.yticks(np.arange(5.8, 6, step=0.02))
plt.plot([1, 12], [monthmag['mag'].mean(), monthmag['mag'].mean()], color='g', linestyle='--', linewidth=2)
plt.title("Mean Monthly Magnitude of Earthquakes since 1st September 1999")
plt.ylabel("Magnitude")
plt.xlabel("Time")
plt.show()
We will now calculate the standard deviation of both of these graphs
print("The standard deviation for Magnitude data is " + str(monthmag.std()))
print("The standard deviation for Depth data is " + str(monthdepth.std()))
As we can see from both the data above and the graphs, there is a very small standard deviation where magnitude is concerned. This could be attributed to the sample size of the data.
Despite this, there is still quite a large standard deviation in the depth of the earthquakes, although not as large as the previous dataset showed. It would be very interesting to see whether these two patterns still hold using smaller, or even larger, sample sizes.
We have already used line plots to analyse the data contained in the 20 Year Dataset. We will now quickly analyse the monthly data, in order to see how this monthly data would have fit into the model created above.
septmeanmag=simp_monthset['mag'].mean()
septmeanmag
septmeandepth=simp_monthset['depth'].mean()
septmeandepth
While these are the mean depth and mean magnitude for the month of September, in order to compare them against our 20-year averages, we must restrict ourselves to the earthquakes with a magnitude of 5.5 or over.
comparison = simp_monthset[simp_monthset.mag >= 5.5]
comparison['mag'].mean()
monthmag.plot(figsize=(10, 5),color='red')
plt.xticks(np.arange(1, 12, step=1))
plt.yticks(np.arange(5.8, 6, step=0.02))
plt.plot([0, 12], [comparison['mag'].mean(), comparison['mag'].mean()], color='g', linestyle='--', linewidth=2)
plt.title("Mean Monthly Magnitude of Earthquakes since 1st September 1999")
plt.ylabel("Magnitude")
plt.xlabel("Time")
plt.show()
As can be seen from the dashed green line on the graph, the mean magnitude for the month of September was below the monthly average for the past 20 years.
comparison['depth'].mean()
monthdepth.plot(figsize=(10, 5))
plt.xticks(np.arange(1, 12, step=1))
plt.yticks(np.arange(40, 120, step=5))
plt.plot([0, 12], [comparison['depth'].mean(), comparison['depth'].mean()], color='g', linestyle='--', linewidth=2)
plt.title("Mean Monthly Depth of Earthquakes since 1st September 1999")
plt.ylabel("Depth")
plt.xlabel("Time")
plt.show()
As you can see from the dashed green line drawn on the above graph, the mean value for depth over the month of September was a huge outlier
Firstly, we will check the correlation between depth and magnitude using the means for each year in the last 20 years. This is graphed below.
yeardepthmag = simp_yearset[['depth','mag','year']]
yeardepthmag = yeardepthmag.groupby('year').mean()
yeardepthmag.plot.scatter(x="mag",y="depth",s=50,c='red')
plt.title("Scatter Plot of Magnitude vs Depth over past 20 years")
plt.show()
As we can see from this graph, there doesn't seem to be much of a correlation between these two attributes. We will calculate the correlation coefficient
yeardepthmag.corr(method ='pearson')
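As a sanity check on what `.corr(method='pearson')` reports, the coefficient can also be computed directly from its definition, r = cov(x, y) / (sigma_x * sigma_y). A sketch on made-up values:

```python
import numpy as np

# Made-up yearly means, standing in for yeardepthmag's columns
x = np.array([5.5, 6.0, 6.5, 7.0])      # magnitude
y = np.array([30.0, 42.0, 38.0, 55.0])  # depth

# Pearson r: covariance divided by the product of the two spreads
r = np.sum((x - x.mean()) * (y - y.mean())) / (
    np.sqrt(np.sum((x - x.mean()) ** 2)) * np.sqrt(np.sum((y - y.mean()) ** 2)))
# r agrees with np.corrcoef(x, y)[0, 1]
```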
There seems to be a weak negative linear relationship between the mean yearly depth and the mean yearly magnitude of the earthquakes that have occurred across the world in the last 20 years.
We will also compare the data for each day over the month of September 2019.
monthdepthmag = simp_monthset[['depth','mag','day']]
We will group these by day in order to graph them better
monthdepthmag = monthdepthmag.groupby('day').mean()
monthdepthmag.plot.scatter(x="mag",y="depth",s=50)
plt.title("Scatter Plot of Magnitude vs Depth per day over September 2019")
plt.show()
When mean magnitude per day is graphed against mean depth for the month of September, there looks to be some sort of correlation between the two. To put a value on this, we will again find the correlation coefficient.
monthdepthmag.corr(method ='pearson')
This indicates a weak-to-medium positive correlation between the two factors. It is very interesting that over a shorter period of time there is more of a correlation between the two characteristics.
One thing that may be influencing these statistics is the size of the earthquakes. Let's repeat this scatter plot for the earthquakes of magnitude 5.5 and above in this set.
monthdepthmag_greater_55 = simp_monthset[['depth','mag']]
monthdepthmag_greater_55 = monthdepthmag_greater_55[monthdepthmag_greater_55['mag'] >= 5.5]
monthdepthmag_greater_55.plot.scatter(x="mag",y="depth",s=50,color='magenta')
plt.title("Scatter Plot of Magnitude vs Depth (mag>=5.5) over September 2019")
plt.show()
monthdepthmag_greater_55.corr(method ='pearson')
There seems to be no correlation between the two earthquake characteristics when a filter of > 5.5 magnitude is applied to the dataset.
This is an extremely interesting result, as it indicates that with every earthquake in a time period, there may be a correlation between depth and magnitude, as shown by the analysis on the month of September. When only the earthquakes above 5.5 are taken into account, like in the 20 year dataset, there is little to no correlation, which explains our results above.
With this in mind, we may be able to use a regression line to predict the magnitude of an earthquake based on its depth, or vice versa. As we do not have a full dataset of every earthquake, I will use the month of September as the basis for this analysis. Over such a short time frame it may be slightly inaccurate, but with a larger dataset it could be much more effective.
import seaborn as seab
seab.lmplot(x='depth',y='mag',data=monthdepthmag,fit_reg=True)
This is the same scatterplot found above, but with a regression line fitted to it
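Seaborn draws the regression line but does not report its equation; `np.polyfit` would give the slope and intercept explicitly, which could then be used for prediction. A sketch with hypothetical depth/magnitude pairs (not the real data):

```python
import numpy as np

# Hypothetical (depth, magnitude) pairs, standing in for monthdepthmag
depth = np.array([10.0, 35.0, 70.0, 120.0])
mag = np.array([4.1, 4.4, 4.9, 5.6])

# Degree-1 least-squares fit: mag ~ slope * depth + intercept
slope, intercept = np.polyfit(depth, mag, 1)

# The fitted line can then predict magnitude at any depth
predicted_at_100km = slope * 100.0 + intercept
```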
A way to go a step further with this project would be to collect a full year's worth of earthquakes, or more, and perform the same analysis as was done towards the end of this project. The scatter plots created could give an excellent insight into a possible correlation between the two earthquake characteristics.
For the final part of this project, I will quickly import the data for all of the earthquakes which occured in 2018. I will then quickly group these based on the month in which they occured. I will find the mean data for magnitude and depth and graph these, including a regression line.
Hopefully this will give us an answer to our question.
# January is downloaded first to create the initial dataframe
testset = pd.read_csv("https://earthquake.usgs.gov/fdsnws/event/1/query?format=csv&starttime=2018-1-01&endtime=2018-1-31")
# Each remaining month is downloaded in two halves, with the end date
# chosen according to the length of the month
for i in range(2,13):
    if i in (3,5,7,8,10,12):
        lastday = 31
    elif i == 2:
        lastday = 28
    else:
        lastday = 30
    temp1 = pd.read_csv("https://earthquake.usgs.gov/fdsnws/event/1/query?format=csv&starttime=2018-" + str(i)
                        +"-01&endtime=2018-"+str(i)+"-15")
    temp2 = pd.read_csv("https://earthquake.usgs.gov/fdsnws/event/1/query?format=csv&starttime=2018-" + str(i)
                        +"-16&endtime=2018-"+str(i)+"-"+str(lastday))
    testset = pd.concat([testset,temp1])
    testset = pd.concat([testset,temp2])
    print("import of " + str(i) + " completed")
print("testset; Input complete")
print("testset; testset contains: " + str(len(testset)) + " entries")
We now split the time into month data like above
month = []
parts = []
# Split up the time format on '-'
for i in range(0,len(testset.index)):
    splitup = testset.iloc[i]['time'].split('-')
    parts.append(splitup)
# The second '-' component is the month; convert it to an integer
month = [int(x[1]) for x in parts]
testset['month'] = month
We create a simplified set and make sure all elements have a value >0
simp_testset = testset[['month','mag','depth']]
simp_testset = simp_testset[simp_testset.mag > 0.0]
seab.lmplot(x='depth',y='mag',data=simp_testset,fit_reg=True)
simp_testset['mag'].corr(simp_testset['depth'])
From the analysis of just a month's data, it seems that there may be a medium correlation between the magnitude of an earthquake and the depth at which it occurred inside the Earth.
This is slightly contradicted by the data shown above from the analysis of 2018, where there is only a weak correlation between the two. With a much larger sample size, we should now be able to give a more accurate answer to the question posed above. First of all, I believe there is evidence here to suggest a correlation between the two characteristics.
While this may be only a weak correlation, a correlation coefficient of 0.3 shows that as magnitude increases, so does the depth, for the earthquakes shown in this dataset. An even bigger sample size would be needed to say for certain, but I believe this is at least some evidence in favour.
This project has been incredibly interesting to study, and I would love to explore this data further in the future and see what conclusions I might extract.